1. INTRODUCTION

Face detection, that is, determining the facial region in an image, is one of the most complex and difficult tasks in computer vision. Images are large, and the appearance of a person's face changes with facial expression, lighting, and the angle of the face [1], [2]. In the era of information technology and multimedia, digital images and video play a significant role, and the vast amount of visual data requires techniques and equipment for searching, detection, indexing, and similar tasks [3]. The human face is also one of the most distinctive features that can be used in image or video databases (Wang and Chang [3]), because it is a strong characteristic by which a person can be distinguished or identified. This has become especially important in the last few years, as the emergence of infectious diseases and global epidemics has driven the use of devices and techniques for remote recognition [4].

In 2003, Verma et al. used propagated detection probabilities in a video sequence for face detection and tracking [5]. Eleftheriadis and Jacquin proposed a technique for locating and tracking faces in video teleconferencing sequences for model-assisted coding at low bit rates [6]. Küblbeck and Ernst combined a tracking system with a lighting-robust detection method based on the modified census transform to locate faces in video [7]. Liu and Wang proposed an algorithm for tracking frontal human faces in video using iterative dynamic programming (DP) [8]. Zhao et al. proposed a video-based people-counting system built on face detection and tracking, in which faces are tracked by integrating a new scale-invariant kernel-based tracking algorithm with a Kalman filter [9], [10]. Albiol et al. proposed a tool for detecting human faces in video by improving an algorithm that finds homogeneous skin-like regions [11]. Table 1 compares the proposed work with these studies in the same field.


Table 1. Comparison between the proposed work and the research mentioned above


No. | Paper | Year | Algorithm | Data set | Recognition rate | Wrong rate
1 | [5] | 2003 | Prediction and update tracking | UMIST database | 88% | 12%
2 | [6] | 1995 | Geometrical method | Live data set | 90% | 10%
3 | [7] | 2006 | Census transformation | Live data set | 90.3% | 8%
4 | [8] | 2000 | Template matching | Live data set | 82% | 18%
5 | [9] | 2014 | Kernel-based tracking | Live data set | 93% | 1%
6 | [10] | 2011 | Normalized color coordinates | Live data set | 80% | 20%
7 | [11] | 2000 | Skin detection | ViBE video | Good | Not bad
8 | Proposed | --- | Developed Viola-Jones face detection | MATLAB database + live data set + YouTube | 99.3% | 0.7%


Over the last five years, other approaches have emerged, and the focus on artificial intelligence (AI) algorithms, in particular neural networks of various kinds and deep learning, has become noticeable. These studies include the neural aggregation network [12], convolutional neural networks (CNNs) [14], and deep convolutional neural networks [16]. Table 2 compares these studies with the proposed method.


Table 2. Comparison with studies from the last five years in the same area
No. | Paper | Year | Algorithm | Data set | Recognition rate | Wrong rate
1 | [12] | 2017 | Neural aggregation network | IARPA (Intelligence Advanced Research Projects Activity) Janus Benchmark A (IJB-A) | 95.72% | 94.38%
2 | [13] | 2020 | Real-time video processing | | 82% | 18%
3 | [14] | 2017 | Convolutional neural networks | PaSC (Point and Shoot Face Recognition Challenge), COX Face, and YouTube | 94.96% | 6%
4 | [15] | 2019 | Component-wise feature aggregation network | YouTube Faces, IJB-A (IARPA Janus Benchmark A), and IJB-S (IARPA Janus Surveillance) | 96.50% | 4.50%
5 | [16] | 2019 | Deep convolutional neural network | Live data set | 92.5% | 8.5%
6 | Proposed | --- | Developed Viola-Jones face detection | MATLAB database + live data set + YouTube | 99.3% | 0.7%


This research aims to develop an effective method for locating the human face in video sequences by making important improvements to the Viola-Jones face detection algorithm [17], which locates facial regions in digital images, so that the facial region is not lost in any frame. The method can be used in many important applications, such as video browsing, identifying people's faces in videos for crime detection, and video indexing. The rest of the paper is organized as follows: Section 2 describes the Viola-Jones face detection algorithm, Section 3 normalized cross-correlation (NCC), Section 4 template matching using the Manhattan distance measure in two dimensions, Section 5 the proposed method, Section 6 the results, and Section 7 the conclusion.


2. VIOLA JONES' FACE DETECTION ALGORITHM 

In 2001, Viola and Jones presented a method for fast and accurate face detection that was 15 times faster than any other known technique, with an accuracy of 95% [1], [18]. The method operates on grayscale images and uses what is known as the integral image to recognize the face pattern in the image. Three primary concepts allow it to run in real time: the integral image, AdaBoost, and the cascade structure.

An integral image is a representation from which the sum of pixel intensities in any rectangle of an image can be computed cheaply, so that Haar-like features can be computed easily [19]. Each element of the integral image contains the sum of all pixel values above and to the left of it, which allows the sum over any rectangle to be obtained in constant time. A Haar-like feature consists of two or more rectangles, and its value is measured from these rectangle sums: the sum of the pixels under the white rectangles is subtracted from the sum of the pixels under the black rectangles. When this value is high in a specific area of the image, that area is likely to belong to one of the parts of the face [22].

The AdaBoost algorithm then determines weak classifiers, each of which compares the pixel sums of rectangular regions [20], [21]. Rather than summing the pixel values of every region directly, AdaBoost works on features computed from the integral image and identifies the relevant and irrelevant ones. It constructs a strong classifier as a linear combination of weak classifiers and reduces redundant features [1], [2], [22]. The algorithm selects a small number of critical, clearly visible features to produce highly efficient classifiers at high speed, and it also supports combining sequential classifiers so that the image background is discarded quickly [19].

The number of calculations can be reduced further by cascading, which increases computational efficiency dramatically and decreases the false positive rate. Each input window is first evaluated by the first classifier in the cascade. If this classifier rejects the window, processing of the window ends and the detector returns false; if it accepts the window, the window moves to the next classifier in the cascade, which uses its own selected set of features, and so on. If the window passes all the classifiers, the detector returns true. With this approach, a window that fails any stage is rejected immediately, so it can be decided more quickly whether or not the window contains a face [18], [23]. Figure 1 shows the cascade, and Figure 2 shows examples of the Haar-like features used in the Viola-Jones algorithm [2], [22].
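The integral image and a Haar-like feature value described above can be sketched in a few lines of Python with numpy. This is an illustrative sketch, not the authors' MATLAB implementation: the zero-padded layout, the function names, and the choice of which half of the two-rectangle feature is "white" are assumptions made for the example.

```python
import numpy as np


def integral_image(img: np.ndarray) -> np.ndarray:
    """Integral image padded with a leading zero row and column, so that
    ii[r, c] holds the sum of img[:r, :c] (all pixels above and to the left)."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii


def rect_sum(ii: np.ndarray, top: int, left: int, height: int, width: int) -> int:
    """Sum of the pixels inside a rectangle using four lookups (constant time)."""
    bottom, right = top + height, left + width
    return int(ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left])


def two_rect_feature(ii: np.ndarray, top: int, left: int, height: int, width: int) -> int:
    """Hypothetical horizontal two-rectangle Haar-like feature: the left half is
    treated as white and the right half as black; value = black sum - white sum."""
    half = width // 2
    white = rect_sum(ii, top, left, height, half)
    black = rect_sum(ii, top, left + half, height, half)
    return black - white


img = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 10, 11, 12]])
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))          # 6 + 7 + 10 + 11 = 34
print(two_rect_feature(ii, 0, 0, 3, 4))  # (3+4+7+8+11+12) - (1+2+5+6+9+10) = 12
```

Once the integral image is built, every rectangle sum costs the same four lookups regardless of the rectangle's size, which is what makes evaluating many features per window affordable.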


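[Figure 1: all input windows pass through the first, second, and third classifiers in turn; windows accepted (True) by every classifier are reported as a face region, and windows that fail any classifier are rejected.]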

Figure 1. Cascade of stages (1, 2, 3, 4, ...). A candidate window must pass all stages to be classified as part of a face


[Figure 2: five Haar-like feature types, labeled (1) to (5).]


Figure 2. Examples of Haar-like feature types used in the Viola-Jones algorithm; a feature can indicate a border lying between a dark region and a light region
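The cascade in Figure 1, where each stage is an AdaBoost strong classifier (a weighted vote of weak classifiers), can be sketched as follows. This is a toy illustration under stated assumptions: a "window" is represented by a list of precomputed feature values, and the thresholds, polarities, and weights are hand-picked rather than learned, so none of them come from the paper or from a trained Viola-Jones model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Window = Sequence[float]  # toy stand-in: precomputed feature values of one window


@dataclass
class WeakClassifier:
    feature: Callable[[Window], float]  # extracts one feature value
    threshold: float
    polarity: int   # +1 or -1: which side of the threshold votes "face"
    alpha: float    # weight assigned by AdaBoost during training

    def vote(self, window: Window) -> int:
        value = self.feature(window)
        return 1 if self.polarity * value < self.polarity * self.threshold else 0


@dataclass
class Stage:
    weak: List[WeakClassifier]
    stage_threshold: float

    def accepts(self, window: Window) -> bool:
        # Strong classifier: weighted linear combination of the weak votes.
        score = sum(w.alpha * w.vote(window) for w in self.weak)
        return score >= self.stage_threshold


def cascade_detect(stages: List[Stage], window: Window) -> bool:
    for stage in stages:
        if not stage.accepts(window):
            return False  # rejected early; later stages are never evaluated
    return True           # passed every stage: reported as a face


# Hypothetical two-stage cascade with hand-picked thresholds and weights.
stages = [
    Stage([WeakClassifier(lambda w: w[0], 10.0, -1, 1.0)], stage_threshold=1.0),
    Stage([WeakClassifier(lambda w: w[1], 5.0, -1, 0.6),
           WeakClassifier(lambda w: w[2], 5.0, -1, 0.4)], stage_threshold=0.5),
]
print(cascade_detect(stages, [12, 8, 1]))  # True: passes both stages
print(cascade_detect(stages, [3, 8, 8]))   # False: rejected by stage 1
print(cascade_detect(stages, [12, 2, 9]))  # False: stage 2 score 0.4 < 0.5
```

The early `return False` is the point of the cascade: most background windows are discarded by the first, cheapest stage.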


3. NORMALIZED CROSS-CORRELATION 

NCC is a well-known measure of similarity between two sets of features. It is used to measure how closely entities in one image correspond to entities in another image [24]. Finding matches between two images by matching only points of interest is an important computer vision problem, and NCC is widely used in many techniques and applications that rely on matching image parts [25]. The major benefit of NCC is that it is less sensitive to linear changes in the illumination amplitude of the two compared images. In addition, its value is restricted to the range between -1 and 1, which makes it much easier to set a detection threshold. However, NCC does not have a simple frequency-domain expression, so the fast Fourier transform (FFT), which is effective in the spectral domain, cannot be used directly to compute it; consequently, the computation time increases drastically as the template window grows [26]. The following equation measures the match between the elements t(i,j) of a template of size a×b and the elements f(x,y) of an image of size A×B, where the template is smaller than the image [26]:


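$$\mathrm{NCC}(x,y)=\frac{\displaystyle\sum_{i=-a/2}^{a/2}\sum_{j=-b/2}^{b/2} f(x+i,y+j)\,t(i,j)-a\,b\,\mu_f(x,y)\,\mu_t}{\sqrt{\left(\displaystyle\sum_{i=-a/2}^{a/2}\sum_{j=-b/2}^{b/2} f^{2}(x+i,y+j)-a\,b\,\mu_f^{2}(x,y)\right)\left(\displaystyle\sum_{i=-a/2}^{a/2}\sum_{j=-b/2}^{b/2} t^{2}(i,j)-a\,b\,\mu_t^{2}\right)}} \qquad (1)$$

where (x,y) ranges over all positions of the A×B image, $\mu_f(x,y)$ is the mean of the image pixels covered by the template at position (x,y), and $\mu_t$ is the mean of the template pixels.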
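A direct, brute-force evaluation of Eq. (1) over every template position can be sketched with numpy as below. The sketch indexes windows by their top-left corner instead of the centered indices used in Eq. (1), which only shifts the coordinates; the function name and the zero score for flat (zero-variance) windows are assumptions, and the double loop is deliberately slow and readable rather than optimized.

```python
import numpy as np


def ncc_map(image: np.ndarray, template: np.ndarray) -> np.ndarray:
    """NCC score (Eq. (1)) for every position where the template fits inside
    the image; positions are indexed by the window's top-left corner."""
    f = image.astype(np.float64)
    t = template.astype(np.float64)
    a, b = t.shape
    n = a * b
    mu_t = t.mean()
    t_energy = max((t ** 2).sum() - n * mu_t ** 2, 0.0)
    rows, cols = f.shape[0] - a + 1, f.shape[1] - b + 1
    scores = np.zeros((rows, cols))
    for x in range(rows):
        for y in range(cols):
            w = f[x:x + a, y:y + b]
            mu_f = w.mean()
            numerator = (w * t).sum() - n * mu_f * mu_t
            f_energy = max((w ** 2).sum() - n * mu_f ** 2, 0.0)
            denom = np.sqrt(f_energy * t_energy)
            # A (near-)flat window or template has no variance; score it as 0.
            scores[x, y] = numerator / denom if denom > 1e-12 else 0.0
    return scores


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(40, 40))
template = image[10:18, 20:30].copy()   # cut the template out of the image
scores = ncc_map(image, template)
peak = np.unravel_index(np.argmax(scores), scores.shape)
print(tuple(int(v) for v in peak))      # (10, 20): where the template was cut
```

Because each score is normalized by both local energies, the peak stays at the true location even if the whole image is brightened or its contrast is scaled, which is the illumination robustness mentioned above.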
4. TEMPLATE MATCHING USING MANHATTAN DISTANCE MEASURE FOR 2 DIMENSIONS

Template matching is a popular and practical method used in many techniques and applications for identifying different objects, because it gives high accuracy in pattern recognition and does not require a long execution time compared with other techniques [27]. The Manhattan distance computes the distance between two points as the sum of the absolute differences of their coordinates, and it gives a close approximation of the distance between image pixels. Using the Manhattan equation, the distance between the template points and the image window points is determined, as in Blackmar et al. and Jiang et al. [28], [29]:


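$$d(x,y)=\sum_{i=1}^{n}\lvert x_i-y_i\rvert \qquad (4)$$

where $x_i$ and $y_i$ are corresponding points (pixel values) of the template and the image window, and n is the number of points compared.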
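Eq. (4) and its use for sliding-window template matching can be sketched with numpy as follows. The helper names are illustrative, not from the paper; the sketch assumes the best match is the window with the smallest distance, since a Manhattan distance of 0 means the window equals the template.

```python
import numpy as np


def manhattan(template: np.ndarray, window: np.ndarray) -> float:
    """Eq. (4): sum of absolute differences between corresponding pixels.
    Casting to float first avoids wrap-around when the inputs are uint8."""
    return float(np.abs(template.astype(np.float64) - window.astype(np.float64)).sum())


def best_match(image: np.ndarray, template: np.ndarray):
    """Slide the template over the image and return (top, left, distance) of
    the window with the smallest Manhattan distance (the closest match)."""
    a, b = template.shape
    best = None
    for x in range(image.shape[0] - a + 1):
        for y in range(image.shape[1] - b + 1):
            d = manhattan(template, image[x:x + a, y:y + b])
            if best is None or d < best[2]:
                best = (x, y, d)
    return best


image = np.array([[0, 0, 0, 0],
                  [0, 9, 8, 0],
                  [0, 7, 6, 0],
                  [0, 0, 0, 0]], dtype=np.uint8)
template = np.array([[9, 8],
                     [7, 6]], dtype=np.uint8)
print(best_match(image, template))                          # (1, 1, 0.0)
print(manhattan(np.array([1, 5, 3]), np.array([4, 5, 1])))  # 3 + 0 + 2 = 5.0
```

Unlike NCC, this distance is not normalized, so it is cheap to compute but sensitive to global brightness changes between the template and the frame.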


5. PROPOSED METHOD 

This paper introduces a system for recognizing faces in videos by hybridizing the Viola-Jones face detection algorithm with the NCC algorithm and with template matching using the Manhattan distance measure in two dimensions, in order to identify the face in each video frame. The three algorithms are combined so that faces are identified both quickly and efficiently, and the recognition rate is further increased by adding fuzzy logic to the recognition system. The work is summarized in Figure 3.


[Figure 3 flowchart: read video data (input) → convert the video to frames per second (convert to images) → implement the developed Viola-Jones face detection algorithm (FRDVJ) and detect all faces → implement fuzzy matching → write the video data of the frames and save it (convert the images with detected faces back to video) → save the results of the frames (output images).]

Figure 3. Flowchart of the proposed system 
5.1. Pre- and post-processing

The videos used in this research are of two types: videos taken from the MATLAB database of examples found in the help section, and videos captured with a regular mobile phone camera at different times. At this stage, a set of processes that precede face recognition is carried out: reading the video clip and splitting it into frames. Each frame is then preprocessed separately by converting its colors from the red, green, and blue (RGB) color system to grayscale and resizing it to a uniform size of 500×500 pixels for all frames. The second part of this stage takes place after the developed face recognition algorithm has been applied to the preprocessed frames: in this post-processing step, the frames are stored in the results file, converted back into a video, and saved as an audio video interleave (AVI) file.
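The per-frame preprocessing (RGB to grayscale, then resizing to 500×500) can be sketched in numpy as below. Assumptions: the grayscale weights are the ones documented for MATLAB's `rgb2gray`, and the resize uses nearest-neighbour sampling because the text does not state an interpolation method. Reading frames from the video and writing the AVI file are omitted.

```python
import numpy as np


def to_gray(rgb: np.ndarray) -> np.ndarray:
    """RGB (H x W x 3, uint8) to grayscale with the luma weights documented
    for MATLAB's rgb2gray (0.2989 R + 0.5870 G + 0.1140 B)."""
    weights = np.array([0.2989, 0.5870, 0.1140])
    gray = rgb[..., :3].astype(np.float64) @ weights
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)


def resize_nearest(img: np.ndarray, size=(500, 500)) -> np.ndarray:
    """Nearest-neighbour resize to a fixed (rows, cols) size (assumed method)."""
    h, w = img.shape[:2]
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return img[rows[:, None], cols[None, :]]


def preprocess(frame: np.ndarray) -> np.ndarray:
    """Per-frame preprocessing from Section 5.1: grayscale, then 500x500."""
    return resize_nearest(to_gray(frame), (500, 500))


frame = np.zeros((240, 320, 3), dtype=np.uint8)
frame[:, :160, 0] = 255   # left half pure red
frame[:, 160:, 1] = 255   # right half pure green
out = preprocess(frame)
print(out.shape, out.dtype)             # (500, 500) uint8
print(int(out[0, 0]), int(out[0, -1]))  # 76 150
```

Fixing every frame to the same size means the face template cropped from one frame can be compared pixel-for-pixel with regions of the next frame.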


5.2. Face recognition using the developed Viola Jones' face detection algorithm

After data collection and preprocessing, the stage of detecting and delimiting the face region in each frame begins. The proposed algorithm, face recognition using the developed Viola-Jones detection algorithm (FRDVJ), is applied at this point to the frames extracted from the video clip; it scans each frame and recognizes the face in it. Figure 4 summarizes this algorithm.


1. Begin
2. While (max number of frames <> 0)
3. Begin
4. Find the first frame that contains a face region by applying the Viola-Jones algorithm, and mark it as frame = 1.
5. If frame == 1 do
   A. Apply the Viola-Jones algorithm.
   B. Crop the face region and save it in the im matrix.
6. Else
   A. Apply the Viola-Jones algorithm.
   B. Store the number of regions detected as faces in count.
   C. If count = 0 then
      1) Apply the NCC algorithm to find the highest peak as the face region.
      2) Crop the highest-peak region.
      3) Apply the Manhattan technique between the face template of the previous frame and the current region of the frame.
   D. Else if count > 1
      1) Crop all regions that represent a face.
      2) Apply the Manhattan technique between the face template of the previous frame and each current region of the frame, and return the distances D1, D2, ..., Dn.
      3) Find max D and return its face region.
   E. End if
   F. Exchange the im matrix with the current face region.
7. End if
8. End while.
9. End.


Figure 4. Pseudo code of FRDVJ algorithm 
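One pass of the per-frame logic in Figure 4 can be sketched in Python as follows. This is an interpretation, not the authors' code, and several points are assumptions: the Viola-Jones detector itself is not implemented (its boxes are passed in as `(top, left, height, width)`), candidate boxes are assumed to have the same shape as the previous face template, the single-detection case (count = 1), which Figure 4 does not spell out, simply keeps that box, and because a smaller Manhattan distance means a closer match, "Find max D" is read as "best match" and implemented with the minimum distance.

```python
import numpy as np


def crop(frame, box):
    """box = (top, left, height, width)."""
    top, left, h, w = box
    return frame[top:top + h, left:left + w]


def manhattan(a, b):
    return float(np.abs(a.astype(np.float64) - b.astype(np.float64)).sum())


def ncc_peak(frame, template):
    """Box of the highest NCC peak, using the mean-subtracted form of Eq. (1)."""
    tz = template.astype(np.float64) - template.mean()
    t_norm = np.sqrt((tz ** 2).sum())
    a, b = template.shape
    best_score, best_box = -np.inf, None
    for x in range(frame.shape[0] - a + 1):
        for y in range(frame.shape[1] - b + 1):
            wz = frame[x:x + a, y:y + b].astype(np.float64)
            wz -= wz.mean()
            denom = np.sqrt((wz ** 2).sum()) * t_norm
            score = (wz * tz).sum() / denom if denom > 0 else 0.0
            if score > best_score:
                best_score, best_box = score, (x, y, a, b)
    return best_box


def select_face(frame, boxes, prev_template):
    """One pass of the per-frame logic of Figure 4 (sketch).
    boxes: face boxes from any Viola-Jones detector (hypothetical input)."""
    if prev_template is None:            # first frame containing a face
        return crop(frame, boxes[0]) if boxes else None
    if not boxes:                        # count = 0: fall back to the NCC peak
        return crop(frame, ncc_peak(frame, prev_template))
    if len(boxes) == 1:                  # count = 1: not spelled out in Figure 4
        return crop(frame, boxes[0])
    # count > 1: keep the candidate closest to the previous face template.
    distances = [manhattan(prev_template, crop(frame, bx)) for bx in boxes]
    return crop(frame, boxes[int(np.argmin(distances))])


rng = np.random.default_rng(1)
frame = rng.integers(0, 256, size=(60, 60)).astype(np.uint8)
template = frame[20:30, 25:35].copy()    # "face" from the previous frame
face = select_face(frame, [], template)  # detector found nothing
print(np.array_equal(face, template))    # True
face = select_face(frame, [(0, 0, 10, 10), (20, 25, 10, 10)], template)
print(np.array_equal(face, template))    # True
```

A caller would loop over frames and set `prev_template = face` after each pass, which corresponds to step F (exchanging the im matrix with the current face region).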


5.3. Modifying the proposed algorithm with fuzzy matching

Fuzzy matching, also known as approximate pattern matching, is a method for finding two texts, strings, or entries that are roughly similar but not identical. It was chosen here as an intelligent method based on approximate reasoning, using inference with if-then rules, to increase the power of discrimination and reach the best possible solution: each image whose similarity to the face image being sought exceeds 60% is considered a face image, as shown in Figure 5.
[Figure 5: matching > 60% → face.]

Figure 5. Modifying the proposed algorithm with fuzzy matching
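The 60% rule can be sketched as a single if-then rule in Python. The text does not say how the similarity percentage is computed, so this sketch assumes a hypothetical measure: 1 minus the Manhattan distance divided by its largest possible value for 8-bit images. It is a crisp threshold; a full fuzzy inference system with membership functions is not shown.

```python
import numpy as np

FACE_THRESHOLD = 0.60  # from the text: more than 60% similarity counts as a face


def similarity(template: np.ndarray, candidate: np.ndarray) -> float:
    """Hypothetical similarity in [0, 1] for 8-bit images: 1 minus the
    Manhattan distance divided by its largest possible value (255 per pixel)."""
    t = template.astype(np.float64)
    c = candidate.astype(np.float64)
    return 1.0 - float(np.abs(t - c).sum()) / (255.0 * t.size)


def is_face(template: np.ndarray, candidate: np.ndarray) -> bool:
    # IF similarity > 60% THEN the candidate region is a face.
    return similarity(template, candidate) > FACE_THRESHOLD


template = np.full((4, 4), 100, dtype=np.uint8)
close = np.full((4, 4), 150, dtype=np.uint8)  # similarity = 1 - 50/255, about 0.80
far = np.full((4, 4), 255, dtype=np.uint8)    # similarity = 1 - 155/255, about 0.39
print(is_face(template, close), is_face(template, far))  # True False
```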


6. RESULTS 

The proposed algorithm was applied to all of the specially selected videos, a total of 307 frames. The results of the hybrid FRDVJ algorithm, shown in Table 3, indicate that the Viola-Jones face detection algorithm gave much better results after being hybridized as in the proposed algorithm, especially because of the use of the Manhattan equation.

Two basic measures were used to evaluate the results: the right detection rate (RDR) [30], [31] and the wrong detection rate (WDR) [30], [32]. The first measure is the ratio of the number of frames in which the face was successfully identified to the total number of frames; the higher its value, the better the method identifies the face in the frame. It is computed as:


$$\mathrm{RDR}=\frac{\text{no. of frames with a correctly detected face}}{\text{total no. of frames}}\times 100 \qquad (5)$$


The second measure reflects the algorithm's inability to find and delimit the face region [4]. It is the ratio of the number of frames in which the face could not be found to the total number of frames; the lower its value, the more efficient the system is at finding face regions. It is computed as [4]:


$$\mathrm{WDR}=\frac{\text{no. of frames with a wrongly detected face}}{\text{total no. of frames}}\times 100 \qquad (6)$$
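Eq. (5) and Eq. (6) can be checked against the totals reported in Tables 3 and 4 with a short Python sketch. The consistency check assumes, as the tables do, that every frame is counted as either correctly or wrongly detected.

```python
def detection_rates(correct: int, wrong: int, total: int):
    """Eq. (5) and Eq. (6): right and wrong detection rates in percent."""
    if correct + wrong != total:
        raise ValueError("each frame must be counted as either correct or wrong")
    rdr = 100.0 * correct / total
    wdr = 100.0 * wrong / total
    return rdr, wdr


# Totals reported in Table 3 (before fuzzy matching) and Table 4 (after).
for name, correct, wrong in [("Table 3", 297, 10), ("Table 4", 305, 2)]:
    rdr, wdr = detection_rates(correct, wrong, correct + wrong)
    print(f"{name}: RDR = {rdr:.1f}%, WDR = {wdr:.1f}%")
# Table 3: RDR = 96.7%, WDR = 3.3%
# Table 4: RDR = 99.3%, WDR = 0.7%
```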


Table 3. Results of face recognition in video frames using the proposed method 


No. | Video name | No. of frames | Frames correctly detected | Frames wrongly detected | RDR | WDR
1 | Face_uncomp | 31 | 31 | 0 | 100% | 0%
2 | vipcolorsegmentation | 86 | 86 | 0 | 100% | 0%
3 | visionface | 65 | 63 | 2 | 96.9% | 3.1%
4 | Face_movie1 | 40 | 38 | 2 | 95% | 5%
5 | Face_movie2 | 55 | 52 | 3 | 94.5% | 5.5%
6 | YouTube movie | 30 | 27 | 3 | 90% | 10%
Total (6 videos) | | 307 | 297 | 10 | 96.7% | 3.3%


The algorithm was then modified by adding fuzzy logic, and the results were tested again, as shown in Table 4. The accuracy increased, as the final row of the table shows: after fuzzy logic was used in recognizing faces, the detection rate reached 99.3%, owing to the approximate-matching property added to the proposed algorithm.


Table 4. Results using fuzzy matching 


No. | Video name | No. of frames | Frames correctly detected | Frames wrongly detected | RDR | WDR
1 | Face_uncomp | 31 | 31 | 0 | 100% | 0%
2 | vipcolorsegmentation | 86 | 86 | 0 | 100% | 0%
3 | visionface | 65 | 64 | 1 | 98.5% | 1.5%
4 | Face_movie1 | 40 | 40 | 0 | 100% | 0%
5 | Face_movie2 | 55 | 55 | 0 | 100% | 0%
6 | YouTube movie | 30 | 29 | 1 | 96.7% | 3.3%
Total (6 videos) | | 307 | 305 | 2 | 99.3% | 0.7%
7. CONCLUSION

The results show the importance of combining the Viola-Jones algorithm with the new algorithm and with matching against known templates using the Manhattan equation. As the results table for the added fuzzy logic shows, using fuzzy logic to complement the hybrid algorithm made the recognition and identification of faces in the video clips more accurate, reaching 99.3%. This is because fuzzy logic brings the recognition result close to the face and reduces the possibility of error: any region whose similarity to the face exceeds 60% is considered the part of the face to be recognized in the video. In the future, more video cases can be examined, with different distances from the camera and other movements, to test the algorithm further and ensure that it can reach the person's face quickly and directly so that it can be exploited in several areas.